Abstract: Application-layer multicast implements the 
multicast functionality at the application layer. The main goal of 
application-layer multicast is to construct and maintain efficient 
distribution structures between end-hosts. In this paper we 
focus on the implementation of an application-layer multicast 
distribution algorithm. We observe that the total time required 
to measure network latency over TCP is influenced dramatically 
by the TCP connection time. We argue that end-host distribution 
is not only influenced by the quality of network links but also 
by the time required to make connections between nodes. 
We provide several solutions to decrease the total end-host 
distribution time. 
I. Introduction 

For several years now, group communications have been receiving
significant attention from both industry and the scientific
community [1], [2]. The main goal of group communication is to
enable the exchange of information between group members that can
be located anywhere across the globe.

One of the main applications of group communications is in the
field of multicast. Historically, the first multicast applications
were implemented over the IP layer, an approach known as IP
multicast [3]. However, after nearly a decade of research in the
field, IP multicast was never fully adopted because of several
technical and administrative issues [4].

Several later proposals targeted multicast implementations that
would be easier to deploy over the existing, well-established
Internet protocols and would require little or no modification to
existing routers. A survey of these solutions was provided by
El-Sayed et al. [5].

One of the directions that has been clearly adopted over 
the last few years is application-layer multicast, which imple- 
ments the multicast functionality at the application layer. The 
main goal of application-layer multicast is to construct and 
maintain efficient distribution structures between end-hosts. 
These structures are constructed using an overlay network 
providing the necessary infrastructure for data transfer between 
end-hosts. 

Today's research focuses on many aspects of application-layer
multicast, including construction of overlay networks [6], [7],
optimization issues [8] or security [9]. In our previous work [10]
we addressed the problem of optimally distributing end-hosts (EH)
to overlay network hosts (OH) in order to minimize network latency
and to distribute the load of the OH. We proposed a heuristic
algorithm and proved that it ensures a locally optimal distribution
of EH in real time, and thus can be used to provide a feasible
solution to the distribution problem.

In this paper we focus on the actual deployment of the algorithm
proposed in our previous work in a real, globally-scaled
distributed system: PlanetLab [11]. PlanetLab is a "geographically
distributed overlay network designed to support the deployment and
evaluation of planetary-scale network services" [11]. Using
PlanetLab, researchers can test their algorithms and systems in a
real environment where nodes can become unreachable, network
bandwidth can fluctuate and node processing capabilities can drop
dramatically.

In order to test the real applicability of our previously proposed
algorithm we have developed an overlay network in PlanetLab where
nodes are connected in a complete graph model. There are several
advantages to using such a graph model. First, there is no need to
implement complex routing algorithms [12], which greatly simplifies
the implementation and functionality of the overlay. Second,
maintaining routing tables is no more complex than maintaining
connections with all the other nodes. The downside of this topology
is the large number of connections that must be maintained, which
grows quadratically with the number of OH. However, the simplicity
of routing between OH makes this topology a good candidate for use
as a leaf component in hierarchical topologies [13], [14].
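
To make the connection overhead of a complete graph concrete, the following minimal sketch counts the pairwise connections for the OH set sizes used later in the measurements (3, 10, 20, 30 and 40 nodes). The function name is illustrative only and is not part of the deployed system.

```python
def complete_graph_connections(n_oh: int) -> int:
    """Number of undirected connections in a complete graph of n_oh overlay hosts."""
    if n_oh < 0:
        raise ValueError("n_oh must be non-negative")
    # Every unordered pair of distinct nodes shares exactly one connection.
    return n_oh * (n_oh - 1) // 2


if __name__ == "__main__":
    for n in (3, 10, 20, 30, 40):
        print(n, complete_graph_connections(n))
```

For 40 OH this gives 780 connections, which is still manageable for a leaf component but would not scale to very large overlays.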

Existing research [6], [7], [15] focuses on measuring the delay
between nodes after the overlay has been constructed, or on
measuring the overlay construction time after the TCP connections
have been established. In deploying our algorithm we have observed
that the total time required to measure network latency over TCP is
influenced dramatically by the TCP connection time. In this paper
we also argue that end-host distribution is not only influenced by
the quality of network links but also by the time required to make
connections between nodes.

The paper is structured as follows. In Section II we provide an
overall presentation of the overlay network, discuss our previous
work and identify the main problems in deploying the previously
proposed algorithm. In Section III we present measurement results
obtained with nodes spread across 23 countries and we provide 3
solutions for improving the performance of the measurements.
Finally, we conclude with an overview of the proposed solutions and
mention some future solutions that could also be implemented.

II. Problem Statement 

The measurements that follow in the next sections are based on a
complete graph overlay topology where EH are distributed using a
heuristic algorithm. An example of such a topology is given in
Fig. 1, where we have illustrated the presence of 3 host types:

• End-hosts (i.e. EH); 

• Overlay-hosts (i.e. OH); 

• Monitor-hosts (i.e. MH). 

EH are the producers and consumers of data transferred by 
the overlay containing the OH. MH are used to monitor the 
load of each OH and to distribute the connection of EH. The 
heuristic algorithm we proposed in our previous work is used 
to distribute EH to OH in order to minimize latency and to 
distribute the load of OH. 



Fig. 1. Multicast topology

The distribution algorithm uses the measured latency between all OH
pairs, the load of each OH and the measured latency between each EH
and OH pair. The algorithm is run by the MH each time a new EH must
be connected. At this time, the EH must provide the MH with the
network latency it measured to each OH. Based on this data and the
load reported by each OH, the MH runs the distribution algorithm.

As mentioned in our previous work, once all data is available the
algorithm executes very fast. For instance, in the simulations we
ran, the algorithm execution time for distributing a single EH
among 100 OH is about 3.7 ms. This execution time makes the
proposed algorithm applicable in real time.

We have chosen to deploy the proposed multicast in PlanetLab
because it provides globally-available network services that can be
used to run any application type that can run on a Linux OS. From
the beginning of the implementation process we had to deal with
several problems. First of all, network connections between
PlanetLab nodes or even node CPUs can be heavily loaded, sometimes
even leading to SYN-ACK timeouts for TCP connections. Second, nodes
can be rebooted at any time by PlanetLab Central coordinators in
order to apply a software update, for software maintenance or
simply because of hardware problems. These problems must be handled
by the MH in order to ensure that EH are not distributed to such
nodes and that already distributed EH nodes are redistributed if
necessary (i.e. on OH failure).

TABLE I
Country and OH node count

Country       Node count    Country       Node count
Austria       1             Italy         6
Canada        2             Korea         2
France        4             Poland        3
Germany       9             Romania       2
Greece        1             Spain         2
Hungary       1             Switzerland   1
Israel        1             US            5

We also encountered several problems on the EH side. The proposed
algorithm relies heavily on the measurement data provided by EH.
This means that when joining the network, every EH must first
measure the latency to all OH and then send this data to the MH.
The problem with this approach is that in some cases the response
time from an OH is very long, on the order of seconds, as shown in
the next sections. This leads to an overall distribution time on
the order of seconds or even minutes, which is unacceptable.

III. Measurement Issues and Solutions 

A. Overlay Construction Time 

Although the overlay is constructed only once, measuring the
construction time provides a useful perspective on the time
required to re-construct the overlay in possible future
developments. The construction of the overlay network is not
instantaneous. In order to evaluate the performance and the general
usability of the proposed overlay, we have measured the time needed
to construct the complete graph between the overlay nodes.

Deploying and starting applications on PlanetLab nodes can be done
automatically using applications such as multicopy or multiquery,
which are part of the CoDeploy project [16]. These allow parallel
deployment and execution of commands on a set of nodes. We have
considered 5 settings with a different number of OH nodes. The OH
applications were deployed on nodes from 14 countries (for the
maximum number of 40 OH nodes), as shown in Table I. After starting
the OH applications, each OH connects to all other OH according to
Alg. 1, where OH corresponds to the set of OH, C_out is the set of
outgoing connections and C_in is the set of incoming connections.

At first, each OH starts the connection process to the other OH
nodes. Then, it waits for the connection process to complete. This
process leads to duplicate connections between each OH node pair.
In order to eliminate duplicate connections we measure the
connection latency in each direction by sending a single packet of
1500 bytes and we eliminate the connection with the maximum
latency.

Algorithm 1 Complete connections for one OH
  Let t_1 = @Get_curr_time()
  Let C_out = ∅
  {Start connection sequences}
  for all oh ∈ OH do
    c = @Start_conn_sequence(oh)
    C_out = C_out ∪ {c}
  end for
  {Wait for completion}
  @Wait_for_completion(C_out)
  {Now eliminate duplicate connections}
  Let C_in = @Get_incoming_connections()
  for all c ∈ C_out do
    if ∃c' ∈ C_in : @Src_address(c') = @Dest_address(c) then
      (Meas_out, Meas_in) = @Run_measurements(c, c')
      if Meas_out < Meas_in then
        @End_connection(c)
        C_out = C_out \ {c}
      end if
    end if
  end for
  {Calculate complete connection time}
  Let t_2 = @Get_curr_time()
  Let G_time = t_2 - t_1

According to Alg. 1, each OH calculates a complete connection time
G_time. The complete graph construction time is the maximum of
these values, as shown in Fig. 2. As we can see in Fig. 2, the
construction of the overlay is greatly influenced by the number of
nodes. However, the variation is not linear because the overlay
also depends on other factors such as the quality of network
connections and the load of nodes. The result shown in Fig. 2 has
the following explanation. In the first OH set (i.e. 3 nodes), all
3 nodes are located in European countries, with a minimum load. In
the next OH set (i.e. 10 nodes) we have added additional nodes from
Europe, one node from the US and one node from Asia. This almost
doubled the graph construction time because the node from Asia was
heavily loaded, with the CPU running at over 80% almost all the
time. In the next set (i.e. 20 nodes) we have added additional
nodes from Asia, Canada and Europe which, because of network
connection latencies and heavily loaded nodes (i.e. from Israel and
Germany), quadrupled the construction time. In the next two sets
(i.e. 30 and 40 nodes) we have added additional nodes from Europe
and the US, leading to the results shown in Fig. 2.

Fig. 2. Complete graph construction time (x-axis: OH node count)

B. EH Connection Measurement Issues

When EH nodes are started, each node first connects to all OH nodes
in order to measure the network latency. The measured values are
then sent to the MH, which applies the heuristic algorithm
developed in our previous work [10] to determine the OH node where
each EH must connect. We have identified two components that
significantly influence the measured values: connection time and
network latency. Let EH be the set of EH. Then, the total
measurement time M_i required by an EH is:

$$M_i = \max_{j} \{ \mathrm{Conn}(eh_i, oh_j) + \mathrm{CummLat}(eh_i, oh_j) \} \qquad (1)$$



where eh_i ∈ EH, i = 1, ..., |EH| and oh_j ∈ OH, j = 1, ..., |OH|.
Conn denotes the time needed to establish a connection between eh_i
and oh_j. CummLat denotes the cumulated round-trip latency,
calculated by measuring the time difference between sent and
received packets:

$$\mathrm{CummLat}(eh_i, oh_j) = Lat_1(eh_i, oh_j) + Lat_2(eh_i, oh_j) + Lat_3(eh_i, oh_j) \qquad (2)$$

where Lat_1, Lat_2 and Lat_3 denote the round-trip latencies of 3
packets.
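
As an illustration of Eq. (1) and (2), the sketch below computes the measurement time of one EH from per-OH probe results. It assumes that the cumulated latency is the sum of the three round-trip latencies, which is our reading of "cumulated"; the function names, data layout and sample values are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# One probe result per OH: (connection time in ms, round-trip latencies in ms).
Probe = Tuple[float, Sequence[float]]


def cumulated_latency(round_trips: Sequence[float]) -> float:
    """CummLat, assumed here to be the sum of the measured round-trip latencies."""
    return sum(round_trips)


def measurement_time(probes: Dict[str, Probe]) -> float:
    """M_i: the largest (connection time + cumulated latency) over all probed OH."""
    if not probes:
        raise ValueError("at least one OH probe is required")
    return max(conn + cumulated_latency(lats) for conn, lats in probes.values())


if __name__ == "__main__":
    # Hypothetical values for one EH probing three OH nodes.
    probes = {
        "oh1": (120.0, (30.0, 32.0, 31.0)),
        "oh2": (4500.0, (80.0, 85.0, 90.0)),
        "oh3": (250.0, (300.0, 310.0, 305.0)),
    }
    print(measurement_time(probes))
```

In this example the slow connection to oh2 determines the result even though oh3 has the larger latency, which mirrors the observation that connection time can dominate the measurement.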

We have considered several scenarios, with EH count ranging from 10
to 1000. EH nodes were deployed on nodes from 23 countries (for the
maximum number of 1000 EH nodes), as shown in Table II.

TABLE II
Country and EH node count

Country       Node count    Country       Node count
Argentina     10            Japan         10
Australia     10            Korea         20
Austria       40            Netherlands   20
Belgium       20            Poland        40
Canada        100           Portugal      10
China         20            Romania       20
Finland       10            Russia        20
France        110           Spain         40
Germany       160           Switzerland   10
Greece        10            Taiwan        10
Hungary       20            US            240
Italy         60

Each EH calculates its own M_i value and sends it to the MH, which
calculates an average measurement time, illustrated in Fig. 3. We
can see that the number of OH nodes clearly influences the overall
measurement time. Several values deviate from the linear trend. For
instance, in the case of 40 OH nodes, when running 50 EH nodes the
average time is 39382 ms and when running 100 EH nodes the average
time is reduced to 21571 ms. The explanation for this behavior lies
in the way that the measurements were done. Because PlanetLab
offers a set of resources over the Internet that are shared among
researchers, time measurements can change dramatically from one
execution to another. Moreover, the measurements we made span 10
days. We have actually seen that on one day a given node can be
extremely loaded because other researchers may also be running
experiments, and the next day the node can show a minimum load.
This is in fact the expected behavior of nodes running in a real
networking environment, which differs greatly from controlled
laboratory environments.

Fig. 3. Average EH measurement time

The values shown in Fig. 3 include both the connection time and the
network latency. However, as we can see from Fig. 4, the latency is
only a small part of the measurement time, with average values
ranging from 68.59 ms to 925.86 ms.

The values shown in Fig. 3 clearly show that we should improve the
performance of the measuring algorithm. At this stage, the average
time needed to measure the network latency for 1000 EH nodes in the
40 OH node setting is 89000 ms, which corresponds to almost 1.5
minutes. However, this is the average time, which is much smaller
than the maximum time needed for an EH to make the measurements.
The maximum measurement time is shown in Fig. 5, where we can see
that the maximum time needed to make the measurements is in fact
561192 ms, which is almost 9.5 minutes. The values from Fig. 5 show
that the time needed for all nodes to make the measurements is
influenced by the number of OH and by the number of EH, leading to
the value of 9.5 minutes, which is unacceptable.

Fig. 4. Average EH-OH measured latency

Fig. 5. Maximum EH measurement time

The total distribution time of an EH is also influenced by the
response time of the MH. In all our measurements the MH resides on
a single node in Romania. In Fig. 6 we can see the average response
time of the MH. Interestingly, the response time is not influenced
by the number of OH or by the number of EH, but by the number of
simultaneous requests that are received. EH nodes connect to the MH
only after completing the measurements, which is why we get the
peaks in the figure when a large number of EH connect
simultaneously to the MH. From the measurements we have also seen
that, after receiving the measurement data, the distribution
algorithm runs in under 1 ms for each request; the values shown in
Fig. 6 are thus given by message processing and network delay.

Fig. 6. Average MH response time

After an EH successfully connects to the OH, it can stay connected
for an unlimited time. However, if the connection is interrupted,
it will reconnect to the designated OH. If the designated OH is no
longer available, it must execute the measurement and distribution
all over again. New EH nodes are distributed by the MH without
redistributing the already connected EH nodes.

Fig. 7. Average improved EH measurement time for 1000 EH

As mentioned earlier, in case of OH failure, disconnected 
EH nodes initiate a new measurement and distribution process. 
However, in case of network failures between OH nodes, a 
reconnect mechanism is activated for each OH node that tries 
to re-establish connection with all other OH nodes, effectively 
trying to reconstruct the overlay. 

C. EH Connection Measurement Solutions 

As illustrated in the previous section, the time needed for network
measurements at the application layer is mainly determined by the
connection time between nodes. The network latency, as opposed to
the connection time, has a minimal impact on the total time.

When EH use the proposed overlay, their main goal is not to make
measurements but to actually use it to effectively distribute data.
The time needed to make the measurements should thus be reduced to
the minimum possible.

In this section we propose 3 solutions to the measurement 
problem. After implementing them, we have repeated the 
measurements for the 1000 EH setup, where the modifications 
would have a greater impact. 

The first solution involves reducing the reconnect count to 0,
meaning that if a connect attempt fails, the EH removes the OH from
its list. EH nodes usually try to connect over and over again to OH
nodes until successful. This process dramatically increases the
overall measurement time, as shown in the previous section. By
eliminating the reconnections, we are in fact eliminating OH that
are overloaded or to which we have a poor connection. The
improvements can be immediately seen, as shown in Fig. 7. In this
case, for the maximum setting, with 40 OH nodes, the average
measurement time drops from 89000 ms to 22027 ms, a reduction by a
factor of about 4.
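
A single-attempt probe of this kind can be sketched as follows. This is a minimal illustration under our own assumptions, not the deployed implementation: the function name and the injectable `connect` argument are hypothetical, and probing is sequential for brevity. The optional `timeout_s` is passed to Python's `socket.create_connection`; leaving it as `None` defers to the OS connection timeout, while a value such as 10.0 bounds each attempt from the application.

```python
import socket
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

Address = Tuple[str, int]


def probe_once(
    overlay_hosts: Iterable[Address],
    timeout_s: Optional[float] = None,
    connect: Callable[..., object] = socket.create_connection,
) -> Dict[Address, float]:
    """Try each OH exactly once; return connect times (ms) for reachable OH only."""
    connect_ms: Dict[Address, float] = {}
    for addr in overlay_hosts:
        start = time.monotonic()
        try:
            conn = connect(addr, timeout=timeout_s)
        except OSError:
            # No reconnect: a failed or timed-out attempt drops this OH.
            continue
        connect_ms[addr] = (time.monotonic() - start) * 1000.0
        close = getattr(conn, "close", None)
        if close is not None:
            close()
    return connect_ms
```

The OH nodes missing from the returned dictionary are the ones removed from the EH list; only the remaining ones would be reported to the MH.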

The problem with the first solution is that a connection must be
timed out by the OS to eliminate the OH from the solution. As a
second solution we propose an application-controlled connection
timeout, as opposed to the OS network timeout. In this case we
timed out connections that exceeded 10 seconds, decreasing the
average measurement time from 89000 ms to 12284 ms, a reduction by
a factor of about 7, as shown in Fig. 7. The 10 seconds were chosen
based on the observation that a lower timeout leads to an increased
number of OH nodes eliminated from the solution. This problem is
discussed in detail later in this section.

The third solution involves partitioning the OH and EH nodes into
sub-groups, thus reducing the number of OH per EH and the number of
EH per OH. The partitioning can be seen in Table III. As we can see
from Fig. 7, the average time required for measurements is reduced
to 6459 ms for 40 OH nodes, a reduction by a factor of over 13.

TABLE III
Sub-group partitioning

Sub-group   3 OH       10 OH      20 OH      30 OH      40 OH
            1 OH/EH    2 OH/EH    4 OH/EH    6 OH/EH    8 OH/EH
Grp1        333 EH     200 EH     200 EH     200 EH     200 EH
Grp2        333 EH     200 EH     200 EH     200 EH     200 EH
Grp3        333 EH     200 EH     200 EH     200 EH     200 EH
Grp4        -          200 EH     200 EH     200 EH     200 EH
Grp5        -          200 EH     200 EH     200 EH     200 EH
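
The sub-group idea can be sketched as below. The paper does not state how nodes are assigned to sub-groups, so the contiguous, near-equal split used here is purely an assumption for illustration, as are the function and variable names.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(items: Sequence[T], groups: int) -> List[List[T]]:
    """Split items into `groups` contiguous sub-groups of near-equal size."""
    if groups <= 0:
        raise ValueError("groups must be positive")
    base, extra = divmod(len(items), groups)
    result: List[List[T]] = []
    start = 0
    for g in range(groups):
        # The first `extra` sub-groups receive one additional item.
        size = base + (1 if g < extra else 0)
        result.append(list(items[start:start + size]))
        start += size
    return result


if __name__ == "__main__":
    oh_groups = partition([f"oh{i}" for i in range(40)], 5)
    eh_groups = partition([f"eh{i}" for i in range(1000)], 5)
    # Each EH in sub-group k only probes the OH nodes of sub-group k.
    print([len(g) for g in oh_groups], [len(g) for g in eh_groups])
```

With 5 sub-groups, each EH probes 8 OH nodes instead of 40 and each OH is probed by 200 EH instead of 1000.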

The direct effect of the first two solutions is that the number of
OH nodes for which EH nodes test the connection decreases
significantly as the timeout is reduced. For instance, with the OS
timeout, which can range from a few seconds to a few minutes, fewer
OH nodes are eliminated than with a fixed timeout of 10 seconds, as
shown in Fig. 8. With a single connection attempt (i.e. OS timeout)
the tested percentage is 100% for 3 OH nodes; it drops to 95% for
10 and 20 nodes and then rises to 96.66% for 30 nodes and to 97.43%
for 40 nodes. With the application-layer timeout the tested
percentage is 98.1% for 3 OH nodes and drops to 71.79% for 40 OH
nodes.

Although the partitioning-based solution provides the best timings,
it can limit sub-groups to a set of OH nodes that may not provide
the optimal solution for the entire group. While the
application-layer timeout mechanism seems to be the next best
approach, care must be taken in choosing the timeout value. A
larger connection time does not necessarily mean that the specific
node is heavily loaded; several other factors can also influence
this value, such as a momentarily busy OS or a momentarily busy
application.

Other solutions could also be applied, such as using UDP for
determining the network latency between EH and OH. Such a solution
would eliminate the overhead of TCP connection establishment.
However, because the overlay uses TCP for forwarding data, making
measurements by connecting to OH nodes via TCP provides a more
precise view of the future behavior of OH nodes.
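
For comparison, a UDP-based round-trip measurement might look like the sketch below. It assumes that each OH runs a simple UDP echo service, which is not part of the overlay described here; the function name, payload and default timeout are likewise hypothetical.

```python
import socket
import time
from typing import Optional, Tuple


def udp_rtt_ms(addr: Tuple[str, int], payload: bytes = b"ping",
               timeout_s: float = 1.0) -> Optional[float]:
    """Return the round-trip time in ms for one UDP echo, or None on loss or timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        start = time.monotonic()
        try:
            sock.sendto(payload, addr)
            data, _ = sock.recvfrom(65535)
        except OSError:
            # Covers timeouts (lost datagram) and ICMP-reported errors.
            return None
        if data != payload:
            # An unrelated datagram arrived; treat the probe as failed.
            return None
        return (time.monotonic() - start) * 1000.0
```

No connection is established, so the result contains only the latency component; a lost datagram simply yields `None` after the timeout.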
Fig. 8. Average percentage of measured connections 



IV. Conclusions and Future Work 

We presented several issues and solutions for deploying
application-layer overlay networks. Based on our measurements
conducted over PlanetLab, a real network testing platform, we have
concluded that distributing EH nodes cannot be based only on the
measured network latency, but must also include other elements such
as connection time or EH geographical location to reduce the time
required to make the actual latency measurements.

The identified problems have several solutions. In this paper we
have proposed 3 such solutions: a first one that eliminates
reconnections, a second one that uses application-layer timeouts
and a third one that constructs sub-groups for reducing the number
of OH per EH and of EH per OH. By using these solutions we have
shown that the measurement time can be reduced by a factor of up to
13 for 1000 EH and 40 OH.

As future work, we intend to use UDP for the initial measurements.
However, special care must be taken because a lower timing for UDP
packets does not necessarily imply lower timings for TCP packets. A
study must be made to determine the correspondence between UDP and
TCP timings and how UDP-based measurements could be used to
forecast the overhead introduced by TCP connections. This study
must also take into consideration UDP packet losses that may also
influence the total measurement time.